Causal Emotion Entailment aims to identify the causal utterances responsible for a target utterance that carries a non-neutral emotion in a conversation. Previous works are limited in their understanding of the conversational context and in the accuracy of their reasoning about the emotion cause. To this end, we propose the Knowledge-Bridged Causal Interaction Network (KBCIN), which leverages commonsense knowledge (CSK) as three bridges. Specifically, we construct a conversational graph for each conversation and use event-centered CSK as the semantics-level bridge (S-bridge) to capture deep inter-utterance dependencies in the conversational context via the CSK-Enhanced Graph Attention module. Moreover, social-interaction CSK serves as the emotion-level bridge (E-bridge) and the action-level bridge (A-bridge) that connect candidate utterances with the target one, providing explicit causal clues for the Emotional Interaction module and the Actional Interaction module to reason about the target emotion. Experimental results show that our model outperforms most baseline models. Our source code is publicly available at https://github.com/circle-hit/KBCIN.
translated by 谷歌翻译
Deep learning has made great progress in multi-view 3D reconstruction. Most mainstream solutions currently map views to the shape of an object with a basic structure that combines a 2D encoder and a 3D decoder, and they differ in how they aggregate features from several views. Among them, methods that use attention-based fusion perform better and more stably than the others. However, they still have an obvious shortcoming: each view predicts its merging weights largely independently, so the weights adapt poorly to the global state. In this paper, we propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global state to provide a comprehensive foundation for weight inference. To enhance the ability of the network, we introduce a novel loss function that supervises the overall shape and propose a dynamic two-stage training strategy that can effectively adapt to all reconstructors with attention-based fusion. Experiments on ShapeNet verify that our method outperforms existing SOTA methods while using far fewer parameters than Pix2Vox++, an algorithm of the same type. Furthermore, we propose a view-reduction method based on maximizing diversity and discuss the cost-performance tradeoff of our model, so that it performs better when the number of input views is large and the computational budget is limited.
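The idea of letting each view's merging weight depend on the global state can be sketched as follows. This is a minimal illustration, not the paper's network: it assumes each view is already encoded as a flat feature vector, takes the mean over views as the "global state", and uses a plain dot product as the scoring function. The names `global_aware_fusion` and `softmax` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_aware_fusion(view_features):
    """Fuse per-view feature vectors with weights that depend on a global summary.

    Assumption (not from the abstract): the global state is the mean of all
    view features, and a view's score is its dot product with that mean.
    """
    n_views = len(view_features)
    dim = len(view_features[0])
    # Global summary shared by every branch.
    global_feat = [sum(f[d] for f in view_features) / n_views for d in range(dim)]
    # Each view is scored against the global summary rather than in isolation.
    scores = [sum(a * b for a, b in zip(f, global_feat)) for f in view_features]
    weights = softmax(scores)
    fused = [sum(w * f[d] for w, f in zip(weights, view_features)) for d in range(dim)]
    return fused, weights
```

With three identical views the weights are uniform and the fused vector equals any single view; a view that disagrees with the majority receives a smaller weight under this particular scoring choice.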
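Minimal solutions for relative rotation and translation estimation have been explored in different scenarios, typically relying on the so-called co-visibility graph. However, how to establish a direct rotation relationship between two frames without overlap remains an open problem; solving it could greatly improve the accuracy of visual odometry. In this paper, we propose a new minimal solution for relative rotation estimation between two images with no overlapping area by exploiting a new graph structure, which we call the Extensibility Graph (E-Graph). Unlike a co-visibility graph, our E-Graph stores high-level landmarks, including vanishing directions and plane normals, which are geometrically extensible. Based on the E-Graph, the rotation estimation problem becomes simpler and more elegant, as it can handle pure rotational motion and requires fewer assumptions, such as Manhattan/Atlanta World or planar/vertical motion. Finally, we embed our rotation estimation strategy into a complete camera tracking and mapping system that obtains 6-DoF camera poses and a dense 3D mesh model. Extensive experiments on public benchmarks show that the proposed method achieves state-of-the-art tracking performance.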
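To make concrete why direction landmarks such as vanishing directions and plane normals are enough to recover a rotation, the sketch below uses the classical TRIAD construction: two non-parallel directions observed in both frames determine the rotation between them. This is a standard textbook technique swapped in for illustration, not the solver proposed in the paper, and the function names are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def normalize(u):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:
        raise ValueError("zero-length vector (are the two directions parallel?)")
    return [x / norm for x in u]


def triad_basis(d1, d2):
    """Right-handed orthonormal basis built from two non-parallel directions."""
    t1 = normalize(d1)
    t2 = normalize(cross(d1, d2))
    t3 = cross(t1, t2)
    return [t1, t2, t3]


def rotation_from_two_directions(a1, a2, b1, b2):
    """Return R (3x3 nested list) such that b1 = R a1 and b2 = R a2.

    a1, a2 are directions in frame A (e.g. a vanishing direction and a plane
    normal); b1, b2 are the same landmarks observed in frame B.
    """
    ta = triad_basis(a1, a2)
    tb = triad_basis(b1, b2)
    # R = sum_k tb[k] * ta[k]^T maps the frame-A triad onto the frame-B triad.
    return [[sum(tb[k][i] * ta[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotate(R, v):
    """Apply a 3x3 rotation to a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
```

No point correspondences or overlapping image regions appear anywhere in this construction, which is the property the abstract relies on. With noisy inputs, TRIAD satisfies the first direction pair exactly and the second only approximately.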
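Fake news spreads widely on social media across many domains, leading to real-world threats in areas such as politics, disasters, and finance. Most existing approaches focus on single-domain fake news detection (SFND), and they perform unsatisfactorily when applied to multiple domains. As an emerging field, multi-domain fake news detection (MFND) is attracting increasing attention. However, data distributions, such as word frequency and propagation patterns, vary from domain to domain, a phenomenon known as domain shift. Faced with severe domain shift, existing fake news detection techniques perform poorly in multi-domain scenarios, so a model designed specifically for MFND is needed. In this paper, we first build a benchmark fake news dataset for MFND with domain labels, named Weibo21, which consists of 4,488 fake news items and 4,640 real news items from 9 different domains. We further propose a Multi-domain Fake News Detection model (MDFEND) that uses a domain gate to aggregate multiple representations extracted by a mixture of experts. Experiments show that MDFEND can significantly improve the performance of multi-domain fake news detection. Our dataset and code are available at https://github.com/kennqiang/mdfend-weibo21.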
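The domain gate that aggregates expert representations can be sketched as a softmax-weighted sum. This is a minimal illustration under stated assumptions, not the released MDFEND code: the per-domain gate logits are given as a plain lookup table (`gate_table`, hypothetical) rather than learned, and each expert output is a short list of floats.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def domain_gated_mixture(expert_outputs, gate_table, domain):
    """Aggregate expert representations with weights chosen by the news domain.

    expert_outputs: one feature vector per expert, all of the same length.
    gate_table: hypothetical mapping from domain name to one logit per expert;
                in a trained model these logits would come from a learned gate.
    """
    logits = gate_table[domain]
    if len(logits) != len(expert_outputs):
        raise ValueError("need exactly one gate logit per expert")
    weights = softmax(logits)
    dim = len(expert_outputs[0])
    mixed = [sum(w * out[d] for w, out in zip(weights, expert_outputs))
             for d in range(dim)]
    return mixed, weights


# Illustrative values only: two experts, two domains.
experts = [[1.0, 0.0], [0.0, 1.0]]
gates = {"finance": [0.0, 0.0], "politics": [10.0, -10.0]}
finance_repr, finance_weights = domain_gated_mixture(experts, gates, "finance")
politics_repr, politics_weights = domain_gated_mixture(experts, gates, "politics")
```

Because the weights depend on the domain, the same set of experts can yield a different aggregated representation for each domain, which is one way to absorb domain shift without training a separate model per domain.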
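Decoupling spatiotemporal representation refers to decomposing spatial and temporal features into dimension-independent factors. Although previous RGB-D-based motion recognition methods have achieved promising performance through tightly coupled multi-modal spatiotemporal representations, they still suffer from (i) optimization difficulty in small-data settings due to tightly entangled spatiotemporal modeling; (ii) information redundancy, since the representation usually contains a large amount of marginal information that is only weakly relevant to classification; and (iii) low interaction between multi-modal spatiotemporal information caused by insufficient late fusion. To alleviate these drawbacks, we propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition. Specifically, we disentangle the task of learning spatiotemporal representations into three sub-tasks: (1) learning high-quality, dimension-independent features through a decoupled spatial and temporal modeling network; (2) recoupling the decoupled representations to establish stronger space-time dependency; and (3) introducing a Cross-modal Adaptive Posterior Fusion (CAPF) mechanism to capture cross-modal spatiotemporal information from RGB-D data. The seamless combination of these designs forms a robust spatiotemporal representation and achieves better performance than state-of-the-art methods on four public motion datasets. Our code is available at https://github.com/damo-cv/motionrgbd.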
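Next location recommendation currently plays an important role in location-based social network applications and services. Although many methods have been proposed to solve this problem, three important challenges have not been well addressed so far: (1) most existing methods are based on recurrent networks, which are time-consuming to train on long sequences because they do not allow full parallelism; (2) personalized preferences are generally not considered reasonably; (3) existing methods rarely study systematically how to make effective use of the various kinds of auxiliary information in trajectory data (e.g., user ID and timestamp) and of the spatio-temporal relations among non-consecutive locations. To address these challenges, we propose SanMove, a novel self-attention network based model that predicts the next location by capturing users' long-term and short-term mobility patterns. Specifically, SanMove introduces a long-term preference learning module that uses a self-attention module to capture a user's long-term mobility pattern, which can represent the user's personalized location preferences. Meanwhile, SanMove uses spatial-temporal guided non-invasive self-attention (STNOVA) to exploit auxiliary information when learning short-term preferences. We evaluate SanMove on two real-world datasets and show that it is not only faster than state-of-the-art RNN-based prediction models but also outperforms the baselines for next location prediction.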
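Most pedestrian trajectory prediction methods rely on a large amount of trajectory annotation, which is time-consuming and expensive to obtain. Moreover, a well-trained model may not generalize effectively to a new scene captured by another camera. It is therefore desirable to adapt a model trained on an annotated source domain to the target domain. To achieve domain adaptation for trajectory prediction, we propose a Cross-domain Trajectory Prediction Network (CTP-Net). In this framework, encoders are used in both domains to encode the observed trajectories, and their features are then aligned by a cross-domain feature discriminator. In addition, considering the consistency between observed and predicted trajectories, a target-domain offset discriminator is used to adversarially regularize the future trajectory predictions so that they are in line with the observed trajectories. Extensive experiments demonstrate the effectiveness of our method for domain adaptation in pedestrian trajectory prediction.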
Stance detection models may rely on dataset bias in the text as a shortcut and thus fail to sufficiently learn the interaction between targets and texts. Recent debiasing methods usually treat features learned by small models, or by large models at early training steps, as bias features, and exclude the branch that learns those bias features during inference. However, most of these methods fail to disentangle the "good" stance features from the "bad" bias features in the text. In this paper, we investigate how to mitigate dataset bias in stance detection. Motivated by causal effects, we leverage a novel counterfactual inference framework, which captures the dataset bias in the text as the direct causal effect of the text on stances and reduces that bias by subtracting the direct text effect from the total causal effect. We model bias features as features that correlate with the stance labels but fail on intermediate stance reasoning subtasks, and we propose an adversarial bias learning module to model the bias more accurately. To verify whether our model better captures the interaction between texts and targets, we test it on recently proposed test sets that evaluate understanding of the task from various aspects. Experiments demonstrate that our proposed method (1) models the bias features better, and (2) outperforms existing debiasing baselines on both the original dataset and most of the newly constructed test sets.
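The subtraction at the heart of the counterfactual inference step can be illustrated with a few lines of arithmetic. This sketch assumes, for illustration only, that the total effect and the direct text effect are both available as logit vectors over the same stance labels; how the paper actually parameterizes these effects is not stated in the abstract, and the function names and numbers below are hypothetical.

```python
def counterfactual_debias(total_logits, text_only_logits):
    """Remove the direct text effect from the total effect, label by label."""
    if len(total_logits) != len(text_only_logits):
        raise ValueError("both logit vectors must cover the same labels")
    return [t - d for t, d in zip(total_logits, text_only_logits)]


def predict(labels, logits):
    """Return the label with the highest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


# Illustrative numbers only.
labels = ["favor", "against", "neutral"]
total = [2.0, 1.5, 0.0]       # model that sees both the text and the target
text_only = [1.5, 0.25, 0.0]  # branch that sees only the text (the shortcut)

biased_prediction = predict(labels, total)
debiased_logits = counterfactual_debias(total, text_only)
debiased_prediction = predict(labels, debiased_logits)
```

Here the text-only branch strongly favors "favor" regardless of the target, so subtracting its contribution flips the prediction from "favor" to "against".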
Stereo images, which consist of left- and right-view images with disparity, have recently been used for low-level vision tasks such as rain removal and super-resolution. Stereo image restoration methods usually outperform monocular methods by learning the disparity between the two views, either implicitly or explicitly. However, existing stereo rain removal methods still cannot make full use of the complementary information between the two views, and we find two reasons: 1) rain streaks have more complex distributions in direction and density, which severely damage the complementary information and pose greater challenges; 2) disparity estimation is not accurate enough because the mechanism for fusing features between the two views is imperfect. To overcome these limitations, we propose a new Stereo Image Rain Removal method (StereoIRR) based on sufficient interaction between the two views. It incorporates: 1) a new Dual-view Mutual Attention (DMA) mechanism, which generates mutual attention maps by taking the left and right views as key information for each other to facilitate cross-view feature fusion; 2) a long-range and cross-view interaction, built from basic blocks and dual-view mutual attention, which alleviates the adverse effect of rain on complementary information and helps the features of stereo images achieve long-range and cross-view interaction and fusion. Notably, StereoIRR outperforms other related monocular and stereo image rain removal methods on several datasets. Our code and datasets will be released.
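Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs), but training ViTs is much harder than training CNNs. In this paper, we define several metrics, including Dynamic Data Proportion (DDP) and Knowledge Assimilation Rate (KAR), to investigate the training process, and divide it into three periods: formation, growth, and exploration. In particular, at the last stage of training, we observe that only a tiny portion of the training examples is used to optimize the model. Given the data-hungry nature of ViTs, we ask a simple but important question: is it possible to provide abundant "effective" training examples at every stage of training? To address this issue, we need to answer two critical questions: how to measure the "effectiveness" of individual training examples, and how to systematically generate a sufficient number of "effective" examples. To answer the first question, we find that the "difficulty" of training samples can serve as an indicator of their "effectiveness". To address the second question, we propose to dynamically adjust the "difficulty" distribution of the training data across these evolution stages. To achieve both, we propose a novel data-centric ViT training framework that dynamically measures the "difficulty" of training samples and generates "effective" samples for the model at different training stages. Furthermore, to enlarge the number of "effective" samples and alleviate overfitting in the late training stage of ViTs, we propose a patch-level erasing strategy called PatchErasing. Extensive experiments demonstrate the effectiveness of the proposed data-centric ViT training framework and techniques.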
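The abstract does not say how PatchErasing selects or fills patches, so the sketch below shows only the generic idea of patch-level erasing: split an image into a grid of non-overlapping patches and blank out a random subset. Every detail here (uniform random selection, a fixed erase ratio, a constant fill value, the function name `erase_patches`) is an assumption made for illustration.

```python
import random


def erase_patches(image, patch_size, erase_ratio, rng, fill=0.0):
    """Return a copy of `image` with a random subset of patches set to `fill`.

    image: 2D list of floats (height x width), both divisible by patch_size.
    erase_ratio: fraction of patches to erase, rounded down.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    all_patches = [(r, c) for r in range(rows) for c in range(cols)]
    n_erase = int(len(all_patches) * erase_ratio)
    chosen = rng.sample(all_patches, n_erase)
    erased = [row[:] for row in image]  # leave the input untouched
    for r, c in chosen:
        for y in range(r * patch_size, (r + 1) * patch_size):
            for x in range(c * patch_size, (c + 1) * patch_size):
                erased[y][x] = fill
    return erased, sorted(chosen)


# A 4x4 image of ones split into four 2x2 patches, half of which are erased.
image = [[1.0] * 4 for _ in range(4)]
erased_image, erased_patches = erase_patches(image, 2, 0.5, random.Random(0))
```

Operating at patch granularity matches how a ViT tokenizes its input, so each erased patch corresponds to one blanked token under the assumption that the erasing grid and the tokenizer grid coincide.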